Constituent Boundary Parsing for Example-Based Machine Translation
Authors
Abstract
This paper proposes an effective parsing method for example-based machine translation. In this method, an input string is parsed by the top-down application of linguistic patterns consisting of variables and constituent boundaries. A constituent boundary is expressed by either a functional word or a part-of-speech bigram. When structural ambiguity occurs, the most plausible structure is selected using the total values of distance calculations in the example-based framework. Transfer-Driven Machine Translation (TDMT) achieves efficient and robust translation within the example-based framework by adopting this parsing method. Using bidirectional translation between Japanese and English, the effectiveness of this method in TDMT is also shown.
1 Introduction
Example-based frameworks are increasingly being applied to machine translation, since they can provide efficient and robust processing (Nagao, 1984; Sato, 1991; Sumita, 1992; Furuse, 1992; Watanabe, 1992). However, in order to make the best use of the advantages of an example-based framework, it is essential to effectively integrate an example-based method and source language analysis. Unfortunately, when an example-based method is combined with a source language analysis method having complex grammar rules, putting a heavy load on translation, the advantages of the example-based framework may be ruined. To achieve efficient and robust processing by the example-based framework, a lot of studies have been made for the purpose of combining source language analysis with an example-based method, and of efficiently covering the analyzed source language structure by means of transfer knowledge (Grishman, 1992; Jones, 1992; McLean, 1992; Maruyama, 1992, 1993; Nirenburg, 1993).
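As a rough illustration of the two kinds of constituent boundary named in the abstract (a functional word, or a part-of-speech bigram), the following Python sketch marks boundaries in a tagged word sequence. The word list, the bigram list, the tag names, and the `<pos-pos>` marker notation are all assumptions invented for this example; they are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical inventories, invented for illustration only.
FUNCTIONAL_WORDS = {"to", "of", "at"}
BOUNDARY_BIGRAMS = {("pronoun", "verb"), ("verb", "noun")}


def mark_boundaries(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the words with a marker inserted at each part-of-speech bigram boundary.

    Functional words are kept in place and act as boundaries themselves,
    so no bigram marker is inserted next to them.
    """
    out: List[str] = []
    for i, (word, pos) in enumerate(tagged):
        if i > 0:
            prev_word, prev_pos = tagged[i - 1]
            neither_functional = (
                prev_word not in FUNCTIONAL_WORDS and word not in FUNCTIONAL_WORDS
            )
            if neither_functional and (prev_pos, pos) in BOUNDARY_BIGRAMS:
                out.append(f"<{prev_pos}-{pos}>")
        out.append(word)
    return out


sentence = [("I", "pronoun"), ("go", "verb"), ("to", "preposition"), ("Kyoto", "noun")]
marked = mark_boundaries(sentence)
print(marked)
```

In this toy input, the bigram marker separates "I" from "go", while "to" serves as the boundary between "go" and "Kyoto" without any extra marker.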
One way to reduce the load of source language analysis is to directly apply transfer knowledge to an input string, which simultaneously executes both structural parsing and transfer knowledge application through pattern-matching. Pattern-matching does not use grammatical symbols such as "Noun Phrase", but uses surface words and non-grammatical symbols. Therefore, in pattern-matching, rule competition is reduced, and linguistic structure is expressed in a simpler manner than in grammar-based parsing. Thus, pattern-matching achieves efficient parsing. It is also useful in treating spoken language, which sometimes deviates from conventional grammar, while grammar-based parsing has difficulty treating unrestricted spoken language.
This paper proposes a constituent boundary parsing method based on pattern-matching, and shows its effectiveness for spoken language translation within the example-based framework. In our parsing method, linguistic patterns expressing some linguistic constituents and their boundaries are applied to an input string in a top-down fashion. When structural ambiguity occurs, the most plausible structure is selected using the total values of distance calculations in the example-based framework. Since the description of a linguistic pattern is simple, it is easy to update by adding feedback.
A constituent boundary parsing method using mutual information is proposed in (Magerman, 1990). This method accounts for unrestricted natural language and is efficient. However, it tends to be inaccurate, and it is difficult to add feedback to, since it depends completely on statistical information without resort to a linguistic viewpoint. On the contrary, in order to achieve accurate parsing and translation, our constituent boundary parsing method implicitly incorporates grammatical information into patterns, e.g.
constituent boundary description by a part-of-speech bigram, and classification of patterns according to linguistic levels such as simple sentence and noun phrase.
Transfer-Driven Machine Translation (TDMT) (Furuse, 1992, 1994) uses the constituent boundary parsing method presented in this paper, as an alternative to grammar-based analysis, and makes the most use of the example-based framework. A bidirectional translation system between Japanese and English for dialogue sentences concerning international conference registrations has been implemented (Sobashima, 1994). Experiments with the system have shown our parsing method to be effective.
Section 2 defines patterns expressed by variables and constituent boundaries. Section 3 explains a method for deriving possible English structures. Section 4 explains structural disambiguation using distance calculations in the example-based framework. Section 5 explains an example of Japanese sentence analysis using our constituent boundary parsing method, and Section 6
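The introduction's outline of the method (top-down application of patterns made of variables and constituent boundaries, with ambiguity resolved by the smallest total distance) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the TDMT implementation: the patterns are limited to the form "X boundary Y", the example table `EXAMPLES` is invented, the `head` rule is hypothetical, and the 0/1 `distance` is a crude stand-in for the paper's distance calculations, which this excerpt does not define.

```python
from typing import Dict, List, Set, Tuple, Union

# A structure is either a leaf string or (boundary, left, right).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]

# Hypothetical examples per pattern "X <boundary> Y": (head of X, head of Y).
EXAMPLES: Dict[str, Set[Tuple[str, str]]] = {
    "in": {("meeting", "Kyoto")},
    "on": {("meeting", "Monday")},
}


def head(tree: Tree) -> str:
    """Assumed head rule: last word of a leaf, else the head of the left child."""
    return tree.split()[-1] if isinstance(tree, str) else head(tree[1])


def distance(boundary: str, left: Tree, right: Tree) -> float:
    """Toy distance: 0 if the head pair matches a stored example, else 1."""
    return 0.0 if (head(left), head(right)) in EXAMPLES[boundary] else 1.0


def parse(tokens: List[str]) -> List[Tuple[float, Tree]]:
    """Apply patterns top-down; return every (total distance, structure)."""
    results: List[Tuple[float, Tree]] = []
    for i, tok in enumerate(tokens):
        # A boundary needs a non-empty variable on each side.
        if tok in EXAMPLES and 0 < i < len(tokens) - 1:
            for left_dist, left in parse(tokens[:i]):
                for right_dist, right in parse(tokens[i + 1:]):
                    total = left_dist + right_dist + distance(tok, left, right)
                    results.append((total, (tok, left, right)))
    if not results:  # no pattern applies: treat the span as a leaf
        results.append((0.0, " ".join(tokens)))
    return results


candidates = parse("meeting in Kyoto on Monday".split())
best = min(candidates, key=lambda c: c[0])
print(best)
```

With these invented examples the input has two candidate structures, and the one that attaches "on Monday" to "meeting in Kyoto" is selected because its total distance is lower.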
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
Publication date: 1994